Reverse Code Engineering RCE CD +sandman 2000

home *** CD-ROM | disk | FTP | other *** search

/ Reverse Code Engineering RCE CD +sandman 2000 / ReverseCodeEngineeringRceCdsandman2000.iso / RCE / Library / Manuels & Misc / Assembly / ASM-TUT.ZIP / CHAP15-3.DOC < prev next >

Wrap

Text File | 1990-07-19 | 30.8 KB | 830 lines

Chapter 15 - Subroutines 157 ________________________ What if both strings are in memory but we don't know which segments they are in? In that case, the calling subroutine needs to pass both the segment and the offset for both the from_string and the to_string. Let's do this in C. move_string ( from_string, to_string ) ; In C, this will pass the addresses of the arrays, not the arrays themselves. C, once again, pushes these variables in right to left order. If the compiler is set up to move both the segment and the offset, then the generated C code will be: mov ax, seg to_string push ax mov ax, offset to_string push ax mov ax, seg from_string push ax mov ax, offset to_string push ax call move_string add sp, 8 ; 4 pushes = 8 bytes On the 8086, the low two bytes are ALWAYS the offset and the high two bytes are ALWAYS the segment. Remember, SEG gets the segment starting address of the named variable. We will do the subroutine as a near routine. After setting up BP, we will have: to_string segment bp + 10 to_string offset bp + 8 from_string segment bp + 6 from_string offset bp + 4 old IP bp + 2 bp -> old bp bp + 0 In the subroutine we will have to move the segment and the offset for each pointer. Luckily for us, there are two 8086 instructions for doing this: LDS (load DS) loads the first two bytes into the named register and the next two bytes into DS. LES (load ES) loads the first two bytes into the named register and the next two bytes into ES. If we write: LES di, [bp + 8] then the 8086 will load the first two bytes (bp+8 and bp+9) into DI and the next two bytes (bp+10 and bp+11) into ES. This loads ______________________ The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson The PC Assembler Tutor 158 ______________________ the offset and the segment in the same instruction. If we write: LDS si, [bp + 4] the 8086 will load the first two bytes (bp+4 and bp+5) into SI and the next two bytes (bp+6 and bp+7) into ES, loading both the offset and segment in one instruction. LDS and LES allow you to load the offset into any full arithmetic register (AX, BX, CX, DX, SI, DI, BP or SP) but you can't use AX, CX or DX as addressing registers, so it only makes sense to load BX, SI, DI and BP for use as pointers. The two strings will now be addressed by DS:SI and ES:DI. DS is SI's normal segment, so we don't need to do anything, but we need a segment override for ES:DI. Here is the code for a C subroutine: A C string ends with a 0 byte, that is with a byte having the numeric value 0. It can be any length, but we need to test each byte to find out if it is 0. Notice that we are using (and changing) both DS and ES this time, so we have to PUSH and POP them, just like other registers. ; - - - - - move_string proc near FROM_POINTER EQU [bp+4] TO_POINTER EQU [bp+8] push bp mov bp, sp PUSHREGS ds, es, ax, si, di lds si, FROM_POINTER les di, TO_POINTER move_loop mov al, [si] ; source to al mov es:[di], al ; al to destination inc si ; pointers to next byte inc di and al, al ; is al 0? jnz move_loop POPREGS ds, es, ax, si, di pop bp ret ; calling routine pops variables move_string endp ; - - - - - Basically, the only difference between this and the Pascal move is that (1) here we check for 0 and there we had an actual count, and (2) in Pascal we used "ret (4)" and here the calling routine does the adjustment. Chapter 15 - Subroutines 159 ________________________ DRAWING THE STACK Each time that we have used the stack we have drawn a picture of where everything is on the stack. In case you think that this is some trivial little learning technique, I'm telling you now that at the assembler level, if you are passing variables on the stack and you don't make a diagram on a piece of paper of where everything is, you are guaranteed to consistently reference things by the wrong address. ALWAYS make a paper diagram that includes BP, ALWAYS use EQU statements, and you'll avoid a lot of mistakes. It is now time to get more complex. Everything that follows is more advanced, so it requires some programming experience. Everything that follows is about C modules and recursion. If this gets too complicated or obscure, just skip to the summary at the end of the chapter. I am going to give you a sample subroutine in C and then show you where all the variables go. The following is a complete C file: /* a complete C file - - - - - - - - - - - */ int A, B ; static int C, D ; extern int E, F ; sample_routine ( G, H ) int G, *H ; { int I, J ; static int K, L ; A = I ; /* transfer the words around */ B = G ; J = E ; F = C ; D = K ; *H = I ; return ; } /* end of C file - - - - - - - - - - - - - */ The only thing this routine does is tranfer the words around. This is so you can see where things are stored and how they are accessed. If you don't know C, the only thing you really need to know here is that '*H' means that 'H' is the ADDRESS of an integer, not the integer itself, while '*H' is the actual integer that is being addressed. For those of you who DO know C, you need to know exactly what extern, static and static mean. extern means that it is in an The PC Assembler Tutor 160 ______________________ external file. The first 'static' (which is outside of any subroutine) means that the data is INTERNAL to the file; it won't be shared with other files. Variables A and B, which don't have the word 'static' are GLOBAL and will be shared with other files. The other 'static' (inside the subroutine) means that its address is fixed in memory while I and J, which are not 'static' have their addresses generated every time you call the program. Say what? Yes, that's correct, every time you call the program they can be in a different place. That's what allows recursion, and you'll see how that is implemented in a second. Here is the equivalent program in assembler: ; - - - - - START DATA BELOW THIS LINE PUBLIC A, B EXTRN E:WORD, F:WORD A dw ? B dw ? C dw ? D dw ? K dw ? L dw ? ; - - - - - END DATA ABOVE THIS LINE ; - - - - - START SUBROUTINE BELOW THIS LINE sample_routine proc near ADDRESS_OF_H EQU [bp + 6] G EQU [bp + 4] I EQU [bp - 2] J EQU [bp - 4] push bp mov bp, sp ; set up bp sub sp, 4 ; save space for I and J PUSHREGS ax, si mov ax, I ; A = I mov A, ax mov ax, G ; B = G mov B, ax mov ax, E ; J = E mov J, ax mov ax, C ; F = C mov F, ax mov ax, K ; D = K mov D, ax mov ax, I ; *H = I mov si, ADDRESS_OF_H mov [si], ax Chapter 15 - Subroutines 161 ________________________ POPREGS ax, si mov sp, bp ; readjust sp pop bp ret ; a C routine so calling routine pops sample_routine endp ; - - - - - END SUBROUTINE ABOVE THIS LINE This time the setup is a little longer. It is: push bp mov bp, sp sub sp, 4 PUSHREGS ax, si I and J are not in the data segment, so we need to make space for them somewhere, and we do it on the stack. We subtract 4 from sp to provide ourselves with 4 bytes, two for I and two for J. There are two EQU statements that say exactly where I and J will be. We also push AX and SI because we will use them. After this setup, we have: address of H bp + 6 G bp + 4 old IP bp + 2 bp -> old bp bp + 0 I bp - 2 J bp - 4 old ax bp - 6 sp -> old si bp - 8 We have created space for some variables below bp. This is temporary and will disappear when we leave the subroutine. We do our dummy calculations,{1} and then do the end adjustment. There are two ways to readjust sp before returning: add sp, 4 just adds the amount that we subtracted, so it winds up in the same place. But the place it winds up is ALWAYS the address of the old BP, and bp is now pointing to that, so mov sp, bp does exactly the same thing. ARE YOU REALLY DEMENTED? Yes, for those of you who are truly masochistic, we have - ta-dah! - the Towers of Hanoi in assembler language. ____________________ 1. And that's 'dummy' in more ways than one. The PC Assembler Tutor 162 ______________________ The Towers of Hanoi is a game with three posts and a number of disks which have incrementally smaller diameters. At the beginning of the game, all the disks are on post one, ordered by size with the smallest on top and the largest on the bottom. For five disks it looks like this: (1) (2) (3) X X X X X X XXXXX X X XXXXXXX X X XXXXXXXXX X X XXXXXXXXXXX X X XXXXXXXXXXXXX X X XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX The object is to wind up with all the disks on post three in the same order. The game has only one rule: You may never put a larger disk on top of a smaller disk. The general solution to this problem is that if you have posts A, B and C with N disks on post A and you want to move them to post C, first move (N-1) disks to post B, move the bottom disk from post A to post C, then move (N-1) disks from post B to post C. This works out to be the optimal recursive solution. Try it out with N = 1, N = 2 and N = 3. N = 4 is already complicated. I won't discuss the game further or how to get the solution. If you don't know about it, you can look it up in either Doug Cooper's "Oh! Pascal" or Robert Kruse's "Data Structures and Program Design". The fact that I need to resort to such an unususl example to illustrate recursion underlines the fundamental rule of recursion, which is: YOU SELDOM NEED TO USE RECURSION, BUT WHEN YOU NEED IT YOU REALLY REALLY NEED IT. First, here is the solution in C: /* the C solution - - - - - - - - - - - - - - - */ #define MAXIMUM 9 main () { int count ; while (1) { printf ("Enter a number less than 10.\n" ) ; scanf ( "%d", &count ) ; if ( count > MAXIMUM ) continue ; Chapter 15 - Subroutines 163 ________________________ towers_of_hanoi ( count, 1, 3, 2 ) ; } } /* - - - - - - - - - - - - - - - - - - - - - - - - - */ towers_of_hanoi ( count, from, to, via ) int count, from, to, via ; { if (count <= 0) return ; count-- ; towers_of_hanoi ( count, from, via, to ) ; printf ("Move a disk from %1d to %1d.\n" , from, to ) ; towers_of_hanoi ( count, via, to, from ) ; return ; } Notice that by letting a routine call itself, we have reduced it to just a few lines. And this is a problem that looks very complex. Here comes the assembler equivalent of the C code: ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE MAX_COUNT EQU 9 enter_message db "Enter a number less than 10", 0 make_a_move_message db "Move a disk from " from_byte db "X to " to_byte db "X.", 0 ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE outer_loop: lea ax, enter_message ; count message call print_string call get_unsigned_byte ; count in al sub ah, ah ; zero ah for 'push ax' cmp al, MAX_COUNT ; too big? ja outer_loop ; move from post 1 to post 3 via post 2 mov bx, 2 ; post 2 = 'via' push bx mov bx, 3 ; post 3 = 'to' push bx mov bx, 1 ; post 1 = 'from' push bx push ax ; al = count, ah = 0 call towers_of_hanoi add sp, 8 ; adjust the stack jmp outer_loop ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE ; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE The PC Assembler Tutor 164 ______________________ towers_of_hanoi proc near VIA EQU [bp + 10] TO EQU [bp + 8] FROM EQU [bp + 6] COUNT EQU [bp + 4] push bp ; set up bp mov bp, sp push ax cmp BYTE PTR COUNT, 0 ; if no disks, we are done jbe exit dec BYTE PTR COUNT ; 1 less disk to move ; first half push TO push VIA push FROM push COUNT call towers_of_hanoi add sp, 8 ; adjust the stack ; print the message mov al, FROM ; get 'from' number add al, '0' ; convert to ascii mov from_byte, al ; put into message mov al, TO ; get 'to' number add al, '0' ; convert to ascii mov to_byte, al ; put into message lea ax, make_a_move_message call print_string ; second half push FROM push TO push VIA push COUNT call towers_of_hanoi add sp, 8 ; adjust the stack exit: pop ax pop bp ret towers_of_hanoi endp ; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE The main routine checks that the number you enter is not too big and then calls 'towers_of_hanoi'. After setting up BP, the stack looks like this: VIA post bp + 10 TO post bp + 8 Chapter 15 - Subroutines 165 ________________________ FROM post bp + 6 count bp + 4 old IP bp + 2 bp -> old bp bp + 0 We make some EQU statements to define where each variable is and set up BP. We are using only one register, so we have a single PUSH instead of using PUSHREGS. We then check to see if the count is 0. If it is we are done. If not, we decrement the count by 1 which gives us 'count - 1' and divide the problem into three parts. Part 1 calls 'towers_of_hanoi' and moves 'count - 1' disks from 'from' to 'via'. Part 2 prints a message of where to move the bottom disk. It converts the post numbers into ascii and inserts them in the string where the 'X's are.{2} Part 3 calls towers_of_hanoi again, this time moving the 'count - 1' disks from 'via' to 'to'. Assemble and link it with ASMHELP. When you run it, start with just 1 disk, then 2 then 3 etc. If you use the maximum number (9), it will print 511 lines. For N disks, you need (2 ** N) - 1 moves, so this gets very big very fast. If you still feel no need to draw a picture of the stack or to use EQU statements, why don't you try writing this subroutine using the actual pointer values, i.e: push [bp + 6] and so on. See how long it takes and see how easy it is to read once it's done. Can you get it to work correctly? By the way, how large does the stack get? Well, if you raised the limit to 30 disks and entered 30, It would take your computer about a year, running 24 hours a day, to complete the solution (about 1 billion moves). The maximum stack size would be ((disks+1) * stack_use_per_disk). The extra 1 is for the calling routine. That is 31 * 14 bytes (including pushing IP, BP, and AX), or a mere 434 bytes. ____________________ 2 It uses the fact that for a data declaration, a variable name has the address of the first piece of data on that line. The PC Assembler Tutor 166 ______________________ SUMMARY To call a procedure you use CALL with the procedure name: call subroutine1 Procedures may be either FAR or NEAR. If there is a mismatch between which type the assembler thinks it is and which type it really is, there will be an error, either at the assembler level for internal subroutines or at the linker level for external subroutines. You may override the default type for procedures with PTR: call NEAR PTR subroutine5 call FAR PTR subroutine6 This is normally needed if the procedure comes after the call in the file and is not the default type. To allow other files to use a procedure you declare it PUBLIC within its own segment. PUBLIC subroutine1 To use a PUBLIC procedure from another file you declare it EXTRN, stating which type it is: EXTRN subroutine1:NEAR, subroutine2:FAR You should make this declaration in the code segment and you should make it before the procedure is referenced. A procedure is defined by giving a name followed by the word 'proc' (procedure) followed by either FAR or NEAR subroutine1 proc near subroutine2 proc far and a procedure is ended by giving the procedure name followed by 'endp' (end of procedure): subroutine2 endp One procedure must be ended before another is begun. Data is normally passed from one procedure to another on the stack: push variable1 push variable2 push variable3 call subroutine1 If this is done, then the called procedure references this data by using BP, the base pointer. The standard setup code is: Chapter 15 - Subroutines 167 ________________________ push bp ; save old bp mov bp, sp ; set bp to current top of stack What the stack looks like at this point depends on whether it is a near or a far procedure. For a near procedure, we have: variable1 bp + 8 variable2 bp + 6 variable3 bp + 4 old IP bp + 2 bp -> old BP bp + 0 For a far procedure we have: variable1 bp + 10 variable2 bp + 8 variable3 bp + 6 old CS bp + 4 old IP bp + 2 bp -> old BP bp + 0 Although it is theoretically possible to access these variables by their pointer definition: mov ax, [bp + 10] It is much less error prone and much clearer to use EQU statements: VAR1 EQU [bp + 10] mov ax, VAR1 If you are writing a recursive procedure and you need temporary variables, you can allot space on the stack for these variables: sub sp, 6 ; room for 6 bytes of temp. variables This should be done before any other pushes are done: push bp mov bp, sp sub sp, 6 PUSHREGS ax, bx, cx, dx These variables should also be named with EQU statements, and as always, you should draw a picture of what is on the stack: variable1 bp + 10 variable2 bp + 8 variable3 bp + 6 old CS bp + 4 old IP bp + 2 bp -> old BP bp + 0 VAR4 bp - 2 The PC Assembler Tutor 168 ______________________ VAR5 bp - 4 VAR6 bp - 6 VAR4 EQU [bp - 2] VAR5 EQU [bp - 4] VAR6 EQU [bp - 6] Data which is passed to the procedure is at a positive offset to BP while data that is created in the procedure is at a negative offset to BP. If you have created a data area for yourself on the stack, then you must eliminate it before leaving the procedure. There are two ways of doing this. One way is to add back what you have subtracted: POPREGS ax, bx, cx, dx add sp, 6 pop bp ret The other way is to give SP the value in BP because this is the place where SP will wind up anyway: POPREGS ax, bx, cx, dx mov sp, bp pop bp ret Use whichever one is clearer to you. When you return from a procedure that has had data passed to it, the data must be taken off the stack. There are two ways of doing this. The C standard is that it is the calling program's responsibility to do this: push variable1 push variable2 push variable3 call subroutine1 add sp, 6 ; 3 pushes = 6 bytes The Pascal standard is that you take them off the stack on the return. There is a special return instruction for that: ret (6) ; 3 pushes = 6 bytes DATA Data can be made available to other files and data can be accessed from other files. To make data available, declare it PUBLIC: PUBLIC variable1, variable2, variable3 Chapter 15 - Subroutines 169 ________________________ To access PUBLIC data from other files, use an EXTRN statement which includes the data type: EXTRN variable7:BYTE, variable8:WORD, variable9:DWORD EXTRN variable10:QWORD, variable11:TBYTE This EXTRN statement must be in a segment which has the same ASSUME segment register as will be used when accessing the data. Normally this is DS, but it can be something else. For instance, if the above EXTRN statements were in MORESTUFF and you have: ASSUME es:MORESTUFF then every time you access variable8: mov dx, variable8 the assembler will code an ES segment override. PUSHREGS and POPREGS When writing a subroutine, you should always save any registers that you use by pushing them. push ax push bx push cx They are then popped before returning pop cx pop bx pop ax In order to save a lot of lines of code, there are two macros, PUSHREGS and POPREGS. They are designed so you may use a word processor to copy them. PUSHREGS pushes in left to right order and POPREGS pops in right to left order: PUSHREGS ax, bx, cx, dx POPREGS ax, bx, cx, dx is a matched pair. LES and LDS LDS (load DS) loads the first two bytes into the named register and the next two bytes into DS. LES (load ES) loads the first two bytes into the named register and the next two bytes into ES. les si, [bp+6] lds di, [bp+10]